Amendments to the Claims

1. (currently amended) In a computer system having a large disk resident data set, a
method of analyzing the disk resident data set using a patient rule induction method 
(PRIM), the method comprising steps of: 

(a) receiving a meta parameter and a relational data table comprised of continuous 
attributes, discrete attributes and a cost attribute, wherein the cost attribute represents 
cost output values based on continuous attribute values and discrete attribute values
as inputs; 

(b) defining a hyper-rectangle enclosing a multi-dimensional space defined by the 
continuous attribute values and the discrete attribute values, wherein the continuous 
attribute values and the discrete attribute values are represented as points within the 
multi-dimensional space, further including steps of:

(i) separating the data into at least three lists, including a continuous attribute 
list for each continuous attribute containing the continuous attribute values, a 
discrete attribute list containing the discrete attributes and the discrete attribute 
values, and a cost attribute list containing the cost output values; 

(ii) adding a label to each of the continuous attribute lists, the discrete 
attribute list and the cost attribute list, such that the label is an index of a tuple to 
which the respective attribute value belongs within the relational data table; 

(iii) sorting the continuous attribute lists based on a continuous attribute value 
in each row of the continuous attribute lists; and 

(iv) adding a label to the cost list, such that the label is a cost flag that 
indicates whether the tuple containing the cost output value is enclosed within 
the hyper-rectangle, such that the cost flag is initially set to one; 

(c) removing a plurality of points along edges of the hyper-rectangle based on an 
average of the cost output value from the plurality of points until a count of the points 
enclosed within the hyper-rectangle equals the meta parameter; and 

(d) adding removed discrete attribute value points and the continuous attribute 
value points along the edges of the hyper-rectangle until a sum of the cost output values 
over the multi-dimensional space enclosed by the hyper-rectangle changes.

2. (currently amended) The method of claim 1, wherein the continuous attributes are
A_c continuous attributes, the discrete attributes are A_d discrete attributes and the meta
parameter is β_0, further wherein the data of step (b)(i) is separated into A_c + 2 lists,
such that at least A_c continuous attribute lists are formed[[, further including steps of:

(b)(i) separating the data into A_c + 2 lists, such that a list is generated for
each continuous attribute to form A_c continuous attribute lists containing the
continuous attribute values, a discrete attribute list containing the A_d discrete
attributes and the discrete attribute values and a cost attribute list containing the
cost output values;

(b)(ii) adding a label to each of the A_c continuous attribute lists, the
discrete attribute list and the cost attribute list, such that the label is an index of a
tuple to which the respective attribute value belongs within the relational data
table;

(b)(iii) sorting the A_c continuous attribute lists based on a continuous
attribute value in each row of the A_c continuous attribute lists; and

(b)(iv) adding a label to the cost list, such that the label is a cost flag that
indicates whether the tuple containing the cost output value is enclosed within
the hyper-rectangle, such that the cost flag is initially set to one]].
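
The list layout recited in steps (b)(i) to (b)(iv) of claim 1 can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the claimed implementation: the table is held in memory as a list of dictionaries rather than on disk, and the names `build_lists`, `CostEntry`, `age`, `region` and `cost` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CostEntry:
    index: int     # label: index of the tuple in the relational table
    cost: float    # cost output value
    flag: int = 1  # cost flag, 1 = tuple is enclosed by the hyper-rectangle


def build_lists(rows, continuous, discrete, cost):
    """Split a table into A_c continuous lists, one discrete list and one cost list."""
    # One list per continuous attribute, each row labelled with its tuple
    # index and sorted by attribute value (steps (b)(i) to (b)(iii)).
    cont_lists = {
        name: sorted((row[name], i) for i, row in enumerate(rows))
        for name in continuous
    }
    # A single discrete list holding (attribute, value, tuple index) rows.
    disc_list = [
        (name, row[name], i) for i, row in enumerate(rows) for name in discrete
    ]
    # The cost list carries the cost flag, initially one (step (b)(iv)).
    cost_list = [CostEntry(i, row[cost]) for i, row in enumerate(rows)]
    return cont_lists, disc_list, cost_list


rows = [
    {"age": 41.0, "region": "north", "cost": 3.0},
    {"age": 23.0, "region": "south", "cost": 9.0},
    {"age": 35.0, "region": "north", "cost": 5.0},
]
cont_lists, disc_list, cost_list = build_lists(rows, ["age"], ["region"], "cost")
```

With one continuous attribute the sketch yields A_c + 2 = 3 lists, and every row keeps the tuple index needed to reach its cost flag later.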

3. (original) The method of claim 2, wherein, step (c) further including steps of: 

(c)(i) determining the discrete attribute value enclosed within the plurality of 
points along an edge of the hyper-rectangle with a lowest average cost output value; 

(c)(ii) determining the continuous attribute value enclosed within the plurality of 
points along an edge of the hyper-rectangle with a lowest average cost output value; 

(c)(iii) comparing the lowest average cost output value determined in step (c)(i) 
with the lowest average cost output value determined in step (c)(ii) to determine an 
attribute with the lowest average cost output value;

(c)(iv) removing all continuous attribute value points and all discrete attribute
value points of the tuples containing the attribute determined in step (c)(iii) from the 
hyper-rectangle; and 

(c)(v) repeating steps (c)(i) to (c)(iv) until the count of the points within the 
hyper-rectangle equals β_0.
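
The repetition in step (c)(v) can be sketched as a loop over the cost flags. The sketch below is a simplification under stated assumptions: `peel_until` and `cheapest_tuple` are hypothetical names, the candidate rule removes one tuple at a time instead of comparing attribute histograms, and the loop stops once the enclosed count is no longer above β_0 so that it cannot overshoot indefinitely.

```python
def peel_until(flags, costs, beta_0, next_candidate):
    """Zero cost flags until the enclosed tuple count reaches beta_0."""
    while sum(flags) > beta_0:
        # Ask the (hypothetical) candidate rule which tuples to peel next.
        doomed = [i for i in next_candidate(flags, costs) if flags[i] == 1]
        if not doomed:
            break  # nothing left to peel
        for i in doomed:
            flags[i] = 0
    return flags


def cheapest_tuple(flags, costs):
    """Stand-in candidate rule: the enclosed tuple with the lowest cost."""
    live = [i for i, f in enumerate(flags) if f == 1]
    return [min(live, key=lambda i: costs[i])]


costs = [3.0, 9.0, 5.0, 1.0, 7.0]
flags = peel_until([1] * 5, costs, 3, cheapest_tuple)
```

Here the tuples with costs 1.0 and 3.0 are peeled in turn, leaving three enclosed tuples.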

4. (original) The method of claim 3, wherein, step (c)(i) further including steps of: 

1) generating A_d discrete histograms, one for each discrete attribute containing the
discrete attribute value and an average of the cost output value for each tuple
containing the discrete attribute value; and 

2) comparing the average cost output value for each discrete attribute value to 
determine the discrete attribute value with the lowest average cost output value. 
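
The two steps of claim 4 can be sketched as follows. This is an illustration only: the function names are hypothetical, the discrete list is assumed to hold `(attribute, value, tuple index)` rows, and costs and cost flags are assumed to be plain lists addressed by tuple index.

```python
from collections import defaultdict


def discrete_histograms(disc_list, costs, flags):
    """Build one histogram per discrete attribute: value -> average cost."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for attr, value, idx in disc_list:
        if flags[idx] == 1:  # only tuples still inside the hyper-rectangle
            sums[attr][value] += costs[idx]
            counts[attr][value] += 1
    return {a: {v: sums[a][v] / counts[a][v] for v in sums[a]} for a in sums}


def lowest_average(histograms):
    """Return (average, attribute, value) for the lowest average cost."""
    candidates = [
        (avg, attr, value)
        for attr, hist in histograms.items()
        for value, avg in hist.items()
    ]
    return min(candidates, key=lambda c: c[0])


disc_list = [
    ("region", "north", 0),
    ("region", "south", 1),
    ("region", "north", 2),
    ("region", "east", 3),
]
costs = [3.0, 9.0, 5.0, 1.0]
flags = [1, 1, 1, 0]  # tuple 3 was already removed
histograms = discrete_histograms(disc_list, costs, flags)
best = lowest_average(histograms)
```

The value `east` is ignored because its only tuple is already outside the hyper-rectangle, so `north` is returned with an average cost of 4.0.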

5. (original) The method of claim 4, wherein step 1) further includes a step of:
assigning a code to the discrete attribute list, wherein the discrete attribute list is
sorted based on the assigned code in order to optimize step (c), such that the discrete
attribute values are grouped together according to their discrete attribute.

6. (original) The method of claim 3, wherein, each continuous attribute list is sorted in 
increasing order with a start pointer to a first row in each continuous attribute list, and a 
second continuous attribute list is sorted in decreasing order with an end pointer to a 
first row in each second continuous attribute list, step (c)(ii) further including steps of: 

1) marking a start cutoff value in each of the A_c continuous attribute lists based on a
count of the tuples containing the discrete attribute value determined in step (c)(i) and
enclosed within the hyper-rectangle; 

2) marking an end cutoff value in each of the second continuous attribute lists 
based on the count of the tuples containing the discrete attribute value determined in 
step (c)(i) and enclosed within the hyper-rectangle; 

3) determining a start cost average value for each continuous output value between 
the start pointer and the cutoff value for each of the A_c cost attribute lists;

4) determining an end cost average value for each continuous output value
between the cutoff value and the end pointer for each of the second cost attribute lists; 

5) generating A_c continuous histograms, each containing the continuous attribute
and the average cost output value, wherein the average cost output value is a lesser of
the start cost average value and the end cost average value; and 

6) comparing the average cost output value for each continuous histogram to 
determine the continuous attribute value with the lowest average cost output value. 
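
The start and end averages of claim 6 can be sketched for a single continuous attribute list. The sketch makes simplifying assumptions that are not in the claim: it walks one increasing list from both ends instead of keeping a second list sorted in decreasing order, it locates the cutoffs by counting `k` enclosed tuples, and the names `edge_average` and `continuous_candidate` are hypothetical.

```python
def edge_average(sorted_list, costs, flags, k, from_end=False):
    """Average cost of the k enclosed tuples nearest one edge of a sorted list."""
    rows = reversed(sorted_list) if from_end else iter(sorted_list)
    picked = [idx for _, idx in rows if flags[idx] == 1][:k]
    if not picked:
        raise ValueError("no enclosed tuples left on this attribute")
    return sum(costs[i] for i in picked) / len(picked), picked


def continuous_candidate(sorted_list, costs, flags, k):
    """Keep the lesser of the start and end averages, as in step 5)."""
    start_avg, start_idx = edge_average(sorted_list, costs, flags, k)
    end_avg, end_idx = edge_average(sorted_list, costs, flags, k, from_end=True)
    if start_avg <= end_avg:
        return start_avg, "start", start_idx
    return end_avg, "end", end_idx


ages = [(23.0, 1), (35.0, 2), (41.0, 0), (58.0, 3)]  # (value, tuple index)
costs = [3.0, 9.0, 5.0, 1.0]
flags = [1, 1, 1, 1]
one = continuous_candidate(ages, costs, flags, 1)
two = continuous_candidate(ages, costs, flags, 2)
```

Repeating this for each of the A_c lists and taking the smallest result gives the continuous attribute value compared in step 6).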

7. (original) The method of claim 6, wherein the discrete attribute list is sorted based 
on an assigned code thereby grouping the discrete attribute values according to their 
discrete attribute, step (c)(iv) further includes steps of: 

when the attribute determined in step (c)(iii) is a continuous attribute, 

1) when the start cost average value is less than the end cost average value, 
setting the cost flag equal to zero for each continuous attribute value between the start 
pointer and the start cutoff value using the index of the continuous attribute value to 
reference the cost flag, 

2) setting the start pointer equal to the start cutoff value; 

3) when the end cost average value is less than the start cost average value, 
setting the cost flag equal to zero for each continuous attribute value between the end 
cutoff value and the end pointer using the index of the continuous attribute value to
reference the cost flag, 

4) setting the end pointer equal to the end cutoff value; 

when the attribute determined in step (c)(iii) is a discrete attribute; and 

5) setting the cost flag equal to zero for each discrete attribute value equal to the 
attribute determined in step (c)(iii) using the index of the discrete attribute value to 
reference the cost flag. 
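
The flag updates of claim 7 can be sketched as below. The function names are hypothetical, pointers and cutoffs are assumed to be row positions in an increasing list with an exclusive end pointer, and the cost flags are assumed to be a plain list addressed by tuple index.

```python
def peel_continuous(sorted_list, flags, start, end, cutoff, side):
    """Zero the flags between a pointer and its cutoff, then move the pointer."""
    if side == "start":
        for _, idx in sorted_list[start:cutoff]:
            flags[idx] = 0  # the row's index references the cost flag
        return cutoff, end
    for _, idx in sorted_list[cutoff:end]:
        flags[idx] = 0
    return start, cutoff


def peel_discrete(disc_list, flags, attr, value):
    """Zero the flag of every tuple holding the chosen discrete value."""
    for a, v, idx in disc_list:
        if a == attr and v == value:
            flags[idx] = 0


ages = [(23.0, 1), (35.0, 2), (41.0, 0), (58.0, 3)]  # (value, tuple index)
disc_list = [
    ("region", "north", 0),
    ("region", "south", 1),
    ("region", "north", 2),
    ("region", "east", 3),
]
flags = [1, 1, 1, 1]
pointers = peel_continuous(ages, flags, 0, 4, 3, "end")  # drop the top row
peel_discrete(disc_list, flags, "region", "south")
```

Because every list row carries its tuple index, a removal decided on one attribute list is visible to all other lists through the shared cost flags.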

8. (original) The method of claim 2, wherein a total cost output is a sum of the cost 
output values over the multidimensional space enclosed by the hyper-rectangle 
following step (c), and the cost attribute list includes a cost counter, step (d) further 
including steps of:

(d)(i) for each tuple with the cost flag set to zero, incrementing the cost counter
for each point belonging to the tuple that is not enclosed within the hyper-rectangle;

(d)(ii) determining the discrete attribute value outside of the points enclosed by 
the hyper-rectangle with a highest average cost output value; 

(d)(iii) determining the continuous attribute value outside of the points enclosed 
by the hyper-rectangle with a highest average cost output value; 

(d)(iv) comparing the highest average cost output value determined in step (d)(ii) 
with the highest average cost output value determined in step (d)(iii) to determine an 
attribute with the highest average cost output value; 

(d)(v) decrementing the cost counter for all continuous attribute value points and 
all discrete attribute value points belonging to the tuples containing the attribute 
determined in step (d)(iv), such that attributes with the cost counter equal to zero are 
enclosed within the hyper-rectangle; and 

(d)(vi) repeating steps (d)(i) to (d)(v) until a sum of the cost output value over the 
plurality of points enclosed by the hyper-rectangle is less than the total cost output. 
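
The role of the cost counter in steps (d)(i) and (d)(v) can be sketched as follows. The representation is an assumption made for illustration: the hyper-rectangle is described by hypothetical `bounds` (inclusive ranges for continuous attributes) and `allowed` (value sets for discrete attributes), and the counter of a tuple is the number of its attribute values lying outside the hyper-rectangle.

```python
def cost_counters(rows, bounds, allowed):
    """Count, per tuple, how many attribute values fall outside the box."""
    counters = []
    for row in rows:
        n = sum(1 for a, (lo, hi) in bounds.items() if not lo <= row[a] <= hi)
        n += sum(1 for a, ok in allowed.items() if row[a] not in ok)
        counters.append(n)
    return counters


def paste_discrete(rows, counters, allowed, attr, value):
    """Re-admit a currently excluded discrete value and decrement counters."""
    allowed[attr].add(value)
    for i, row in enumerate(rows):
        if row[attr] == value:
            counters[i] -= 1


rows = [
    {"age": 41.0, "region": "north"},
    {"age": 23.0, "region": "south"},
    {"age": 35.0, "region": "north"},
    {"age": 58.0, "region": "east"},
    {"age": 60.0, "region": "south"},
]
bounds = {"age": (23.0, 41.0)}
allowed = {"region": {"north", "east"}}
before = cost_counters(rows, bounds, allowed)
after = list(before)
paste_discrete(rows, after, allowed, "region", "south")
enclosed = [i for i, n in enumerate(after) if n == 0]
```

Tuple 1 re-enters once `south` is pasted back, while tuple 4 stays outside because its `age` is still out of range. A counter captures that distinction where a single flag could not.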

9. (original) The method of claim 8, wherein, step (d)(ii) further including steps of: 

1) generating discrete histograms, one for each discrete attribute containing the
discrete attribute value and an average of the cost output value for each tuple
containing the discrete attribute value; and 

2) comparing the average cost output value for each discrete attribute value to 
determine the discrete attribute value with the highest average cost output value. 

10. (original) The method of claim 8, wherein the A_c continuous attribute lists are sorted
in increasing order and each contain a start pointer and an end pointer, such that the
continuous attribute values there between are enclosed within the hyper-rectangle, step 
(d)(iii) further including steps of: 

1) sorting the continuous attribute values in the A_c continuous attribute lists
between a first row in the A_c continuous attribute lists and the start pointer in
decreasing order;

2) marking a start cutoff value in each of the A_c continuous attribute lists based on a
count of the tuples containing the discrete attribute value determined in step (d)(ii) and
enclosed within the hyper-rectangle; 

3) marking an end cutoff value in each of the A_c continuous attribute lists based on
the count of the tuples containing the discrete attribute value determined in step (d)(ii)
and enclosed within the hyper-rectangle; 

4) determining a start cost average value for each continuous output value between 
the start pointer and the cutoff value for each of the A_c continuous attribute lists;

5) determining an end cost average value for each continuous output value 
between the end pointer and the cutoff value for each of the A_c continuous attribute
lists;

6) generating A_c continuous histograms, each containing the continuous attribute
and the average cost output value, wherein the average cost output value is a greater of
the start cost average value and the end cost average value; and 

7) comparing the average cost output value for each continuous histogram to 
determine the continuous attribute value with the highest average cost output value. 

11. (original) The method of claim 10, wherein the discrete attribute list is sorted based
on an assigned code thereby grouping the discrete attribute values according to their 
discrete attribute, step (d)(v) further includes steps of: 

when the attribute determined in step (d)(iv) is a continuous attribute, 

1) when the start cost average value is less than the end cost average
value, decrementing the cost counter for each continuous attribute value 
between the start cutoff value and the start pointer using the index of the 
continuous attribute value to reference the cost counter; 

2) setting the start pointer equal to the start cutoff value;

when the attribute determined in step (d)(iv) is a discrete attribute; and

3) decrementing the cost counter for each discrete attribute value 
equal to the attribute determined in step (d)(iv) using the index of the discrete 
attribute value to reference the cost counter.

12. (original) In a parallel architecture computer system having a large disk resident
data set, a method of analyzing the disk resident data set in parallel using a patient rule 
induction method (PRIM), comprising steps of: 

(a) receiving a relational table of data comprised of A_c continuous attributes, A_d
discrete attributes, a meta parameter β_0 and a cost attribute, wherein the cost attribute
represents cost output values based on continuous attribute values and discrete
attribute values as inputs;

(b) defining a hyper-rectangle enclosing a multi-dimensional space defined by the 
continuous attribute values and the discrete attribute values, wherein the continuous 
attribute values and the discrete attribute values are represented as points within 
multi-dimensional space; 

(c) separating the data into A_c + 2 lists, such that a list is generated for each
continuous attribute to form A_c continuous attribute lists containing the continuous
attribute values, a discrete attribute list containing the A_d discrete attributes and the
discrete attribute values and a cost attribute list containing the cost output values;

(d) sorting the A_c continuous attribute lists in parallel among a plurality of
processors based on a continuous attribute value in each row of the A_c continuous
attribute lists;

(e) striping the A_c continuous attribute lists and the discrete attribute lists across the
plurality of processors, wherein each processor contains a copy of the cost attribute list;

(f) removing a plurality of points along edges of the hyper-rectangle using reduction 
and a one to all broadcast based on an average of the cost output value from the 
plurality of points until a count of the points enclosed within the hyper-rectangle equals 
the meta parameter; and 

(g) adding removed discrete attribute value points and the continuous attribute 
value points along the edges of the hyper-rectangle using reduction and a one to all 
broadcast until a sum of the cost output values over the multi-dimensional space 
enclosed by the hyper-rectangle changes. 
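
The combination of reduction and one to all broadcast in step (f) can be sketched by simulating the processors sequentially. This is an illustration under stated assumptions, not a parallel implementation: each simulated processor holds a hypothetical stripe mapping `(attribute, value)` to tuple indices plus its own copy of the cost flags, only discrete candidates are shown, and `local_best` and `peel_step` are hypothetical names.

```python
from functools import reduce


def local_best(stripe, costs, flags):
    """Lowest-average candidate within one processor's stripe, or None."""
    best = None
    for key, idxs in stripe.items():
        live = [i for i in idxs if flags[i] == 1]
        if live:
            cand = (sum(costs[i] for i in live) / len(live), key, live)
            if best is None or cand[0] < best[0]:
                best = cand
    return best


def peel_step(stripes, costs, flag_copies):
    """One peel: reduce local candidates to a winner, then broadcast it."""
    proposals = [local_best(s, costs, f) for s, f in zip(stripes, flag_copies)]
    proposals = [p for p in proposals if p is not None]
    # Reduction: keep the candidate with the lowest average cost
    # (assumes at least one processor still has a live candidate).
    winner = reduce(lambda a, b: a if a[0] <= b[0] else b, proposals)
    # One to all broadcast: every processor zeroes the same cost flags.
    for flags in flag_copies:
        for i in winner[2]:
            flags[i] = 0
    return winner[1]


costs = [3.0, 9.0, 5.0, 1.0]
stripes = [
    {("region", "north"): [0, 2], ("region", "south"): [1]},  # processor 0
    {("region", "east"): [3]},                                # processor 1
]
flag_copies = [[1, 1, 1, 1], [1, 1, 1, 1]]
removed = peel_step(stripes, costs, flag_copies)
```

After the step both flag copies agree, which is what lets each processor keep evaluating its own stripe without consulting the others.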

13. (original) The method of claim 12, wherein step (c) further including steps of:

(c)(i) adding a label to each of the A_c continuous attribute lists, the discrete
attribute list and the cost attribute list, such that the label is an index of a tuple to which
the respective attribute belongs within the relational data table; 

(c)(ii) adding a label to the cost list, such that the label is a cost flag that 
indicates whether the tuple containing the cost output value is enclosed within the 
hyper-rectangle, such that the cost flag is initially set to one. 

14. (original) The method of claim 13, wherein, step (f) further including steps of: 

(f)(i) determining the discrete attribute value enclosed within the plurality of 
points along an edge of the hyper-rectangle with a lowest average cost output value 
using reduction; 

(f)(ii) determining the continuous attribute value enclosed within the plurality of 
points along an edge of the hyper-rectangle with a lowest average cost output value 
using reduction; 

(f)(iii) comparing the lowest average cost output value determined in step (f)(i) 
with the lowest average cost output value determined in step (f)(ii) to determine an 
attribute with the lowest average cost output value; 

(f)(iv) removing all continuous attribute value points and all discrete attribute 
value points of the tuples containing the attribute determined in step (f)(iii) from the
hyper-rectangle by setting the cost flag to zero using the index of the attribute to 
reference the cost flag in the cost list contained in each of the plurality of processors 
using the one to all broadcast; and 

(f)(v) repeating steps (f)(i) to (f)(iv) until the count of the points within the 
hyper-rectangle equals β_0.

15. (original) The method of claim 14, wherein a total cost output is a sum of the cost 
output values over the multidimensional space enclosed by the hyper-rectangle 
following step (f), and the cost attribute list includes a cost counter, step (g) further
including steps of: 

(g)(i) for each tuple with the cost flag set to zero, incrementing the cost counter 
for each point belonging to the tuple that is not enclosed within the hyper-rectangle
among each of the plurality of processors; 

(g)(ii) determining the discrete attribute value outside of the points enclosed by 
the hyper-rectangle with a highest average cost output value using reduction; 

(g)(iii) determining the continuous attribute value outside of the points enclosed 
by the hyper-rectangle with a highest average cost output value using reduction; 

(g)(iv) comparing the highest average cost output value determined in step (g)(ii) 
with the highest average cost output value determined in step (g)(iii) to determine an
attribute with the highest average cost output value;

(g)(v) decrementing the cost counter for all continuous attribute value points and 
all discrete attribute value points belonging to the tuples containing the attribute 
determined in step (g)(iv) using the index of the attribute to reference the cost flag in the 
cost list contained in each of the plurality of processors using the one to all broadcast, 
such that attributes with the cost counter equal to zero are enclosed within the 
hyper-rectangle; and 

(g)(vi) repeating steps (g)(i) to (g)(v) until a sum of the cost output value over the 
plurality of points enclosed by the hyper-rectangle is less than the total cost output. 

16. (original) In a symmetric multi-processor architecture computer system 
having a large disk resident data set, a method of analyzing the disk resident data set in 
parallel using a patient rule induction method (PRIM), comprising steps of: 

(a) receiving a relational table of data comprised of A_c continuous attributes, A_d
discrete attributes, a meta parameter β_0 and a cost attribute, wherein the cost attribute
represents cost output values based on continuous attribute values and discrete
attribute values as inputs; 

(b) defining a hyper-rectangle enclosing a multi-dimensional space defined by the 
continuous attribute values and the discrete attribute values, wherein the continuous 
attribute values and the discrete attribute values are represented as points within
multi-dimensional space; 

(c) separating the data into A_c + 2 lists, such that a list is generated for each
continuous attribute to form A_c continuous attribute lists containing the continuous
attribute values, a discrete attribute list containing the discrete attributes and the
discrete attribute values and a cost attribute list containing the cost output values;

(d) sorting the A_c continuous attribute lists in parallel based on a continuous
attribute value in each row of the A_c continuous attribute lists;

(e) striping the A_c continuous attribute lists and the discrete attribute lists over a
plurality of portions of a shared disk, wherein each portion of the shared disk contains a
copy of the cost attribute list;

(f) removing a plurality of points along edges of the hyper-rectangle using reduction 
and a one to all broadcast based on an average of the cost output value from the 
plurality of points until a count of the points enclosed within the hyper-rectangle equals 
the meta parameter; and 

(g) adding removed discrete attribute value points and the continuous attribute 
value points along the edges of the hyper-rectangle using reduction and a one to all 
broadcast until a sum of the cost output values over the multi-dimensional space 
enclosed by the hyper-rectangle changes. 

17. (original) The method of claim 16, wherein, step (c) further including steps of: 

(c)(i) adding a label to each of the A_c continuous attribute lists, the discrete
attribute list and the cost attribute list, such that the label is an index of a tuple to which
the respective attribute belongs within the relational data table; 

(c)(ii) adding a label to the cost list, such that the label is a cost flag that 
indicates whether the tuple containing the cost output value is enclosed within the 
hyper-rectangle, such that the cost flag is initially set to one.

18. (original) The method of claim 17, wherein, step (f) further including steps of:

(f)(i) determining the discrete attribute value enclosed within the plurality of 
points along an edge of the hyper-rectangle with a lowest average cost output value 
using reduction; 

(f)(ii) determining the continuous attribute value enclosed within the plurality of 
points along an edge of the hyper-rectangle with a lowest average cost output value 
using reduction; 

(f)(iii) comparing the lowest average cost output value determined in step (f)(i) 
with the lowest average cost output value determined in step (f)(ii) to determine an 
attribute with the lowest average cost output value; 

(f)(iv) removing all continuous attribute value points and all discrete attribute 
value points of the tuples containing the attribute determined in step (f)(iii) from the 
hyper-rectangle by setting the cost flag to zero using the index of the attribute to 
reference the cost flag in the cost list contained in each of the plurality of processors 
using the one to all broadcast; and 

(f)(v) repeating steps (f)(i) to (f)(iv) until the count of the points within the
hyper-rectangle equals β_0.

19. (original) The method of claim 18, wherein a total cost output is a sum of the cost 
output values over the multidimensional space enclosed by the hyper-rectangle 
following step (f), and the cost attribute list includes a cost counter, step (g) further 
including steps of: 

(g)(i) for each tuple with the cost flag set to zero, incrementing the cost counter
for each point belonging to the tuple that is not enclosed within the hyper-rectangle
among each of the plurality of portions of the shared disk; 

(g)(ii) determining the discrete attribute value outside of the points enclosed by 
the hyper-rectangle with a highest average cost output value using reduction; 

(g)(iii) determining the continuous attribute value outside of the points enclosed 
by the hyper-rectangle with a highest average cost output value using reduction;

(g)(iv) comparing the highest average cost output value determined in step (g)(ii)
with the highest average cost output value determined in step (g)(iii) to determine an
attribute with the highest average cost output value;

(g)(v) decrementing the cost counter for all continuous attribute value points and 
all discrete attribute value points belonging to the tuples containing the attribute 
determined in step (g)(iv) using the index of the attribute to reference the cost flag in the 
cost list contained in each of the plurality of processors using the one to all broadcast, 
such that attributes with the cost counter equal to zero are enclosed within the 
hyper-rectangle; and 

(g)(vi) repeating steps (g)(i) to (g)(v) until a sum of the cost output value over the 
plurality of points enclosed by the hyper-rectangle is less than the total cost output. 